FIRST (For Inspiration and Recognition of Science and Technology) Robotics is a nonprofit organization that combines robotics and competition at the K-12 level. I had the opportunity to be part of a FIRST Robotics Competition (FRC) team, FIRST's program tier for high school students, during my 4 years of high school.
The structure of a normal FRC season is fairly straightforward: every January, a new game is released. Teams have until the end of February to build a robot (typically weighing around 125 lbs) to accomplish tasks within the game. From the end of February until April, teams compete against other teams at events within their district or at regional events. In a district system, teams seek to win events to earn district points, which determine whether they can advance to their district championship and, from there, the FRC Championship in May. In a regional system, teams seek to win their regional event outright, as a win is a direct ticket to the FRC Championship.
At each event, teams compete during qualification matches in two alliances of 3 teams each (the Blue Alliance and the Red Alliance), and can earn up to 4 ranking points in a match: 2 for winning, and 2 for completing game-specific tasks that vary from season to season. After qualifications, the top 8 ranked teams each select 2 other teams to join their alliance, and the 8 alliances battle it out in quarterfinal, semifinal, and finals matches until 1 alliance is left standing. While playoff alliances initially consist of 3 teams, an alliance can grow to 4 teams if one robot breaks down and a backup team is subbed in. Thus, there can be up to 4 winning teams at each event.
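As a concrete illustration of how ranking points accumulate in a single match, here is a minimal sketch. The function name and the boolean flags for the two 2019 bonus objectives are my own, not part of any FIRST API:

```python
def ranking_points(won, tied, rocket_rp, hab_rp):
    """Ranking points earned in one 2019 qualification match:
    2 for a win (1 for a tie), plus 1 for each of the two
    game-specific bonus objectives that season."""
    base = 2 if won else (1 if tied else 0)
    return base + int(rocket_rp) + int(hab_rp)
```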
In this tutorial, I will use FRC data to walk through the data science pipeline, and use machine learning to predict winners of an event from qualification data.
For the data, I made two distinct choices: 1) I focused on the 2019 season. This was the last "normal" season: the 2020 season was cut short due to COVID-19, competitions didn't occur in 2021, and different districts took on different event models for the 2022 season. As such, the 2019 season was selected for consistency. 2) Given that I participated in FRC for 4 years, I decided to focus on my home district, FIRST Chesapeake (CHS). This district holds events across Maryland and Virginia, which can be seen in the event codes for each respective event (besides the district championship event).
There were 2 sources I could retrieve data from: FRC Event Web, an API provided directly by FIRST, or the API from The Blue Alliance (TBA). TBA serves the data from the FRC Event Web API, along with some additional statistics. From a visual perspective, TBA is also easier to navigate, and it links YouTube videos directly to each match. Because of the added statistics, I decided to use the API from TBA.
import requests
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from scipy import stats
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import sklearn.model_selection as ms
import sklearn.metrics as met
# This is a token needed to read data from the API -- it is stored in a local file to protect the security of the token
with open('TBA_API_AUTH.txt', 'r') as file:
    auth = file.read().replace('\n','')
headers = {'X-TBA-Auth-Key' : auth}
First, I needed to get the different events that happened in CHS in 2019.
events_request = requests.get("https://www.thebluealliance.com/api/v3/district/2019chs/events/keys", headers = headers)
data = events_request.json()
district_events = pd.DataFrame.from_dict(data)
district_events
| 0 | |
|---|---|
| 0 | 2019chcmp |
| 1 | 2019mdbet |
| 2 | 2019mdowi |
| 3 | 2019mdoxo |
| 4 | 2019vabla |
| 5 | 2019vagle |
| 6 | 2019vahay |
| 7 | 2019vapor |
From this list, we can see that there were 7 district events, along with the district championship. To demonstrate the curation process step by step, I will first walk through it for the first district event, 2019mdbet. Afterward, I'll repeat the process for all of the other events so that I can properly explore the data.
The one downside of the TBA API is that the information about events, teams, and scores is spread across various API calls. As such, I will go through each call and compile the information into one DataFrame.
For the first event, I need to gather which teams were at the event, as well as some basic information about each team.
teams_request = requests.get("https://www.thebluealliance.com/api/v3/event/2019mdbet/teams", headers = headers)
data = teams_request.json()
teams_at_event = pd.DataFrame.from_dict(data)
teams_at_event
| address | city | country | gmaps_place_id | gmaps_url | key | lat | lng | location_name | motto | name | nickname | postal_code | rookie_year | school_name | state_prov | team_number | website | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | Edgewater | USA | None | None | frc1111 | None | None | None | None | NASA/The Power Hawks Robotics Club, Inc./Anne ... | Power Hawks Robotics | 21037 | 2003 | South River Senior High School | Maryland | 1111 | http://www.powerhawks.org |
| 1 | None | Fairfax | USA | None | None | frc1123 | None | None | None | None | Sony/AIM ROVER/Redeeming Grace Church&Neighbor... | AIM ⛟ Robotics | 22030 | 2003 | Neighborhood Group | Virginia | 1123 | http://1123.team |
| 2 | None | Bethesda | USA | None | None | frc1389 | None | None | None | None | Leidos/Domaine an RLAH Group/The Travel Fairy/... | The Body Electric | 20817 | 2004 | Walt Whitman High School | Maryland | 1389 | http://www.team1389.org |
| 3 | None | Washington | USA | None | None | frc1446 | None | None | None | None | Friendship Public Charter School/Bechtel Corpo... | Robo Knights | 20019 | 2004 | Friendship Pcs-Collegiate Acad | District of Columbia | 1446 | http://www.firstinspires.org/ |
| 4 | None | Lutherville Timonium | USA | None | None | frc1727 | None | None | None | None | Friends and Family of REX/Levis Family Foundat... | REX | 21093 | 2006 | Dulaney High School | Maryland | 1727 | https://www.dulaneyrobotics.org |
| 5 | None | Haymarket | USA | None | None | frc1885 | None | None | None | None | US STEM Foundation/Lockheed Martin/Macedon Tec... | ILITE Robotics | 20169 | 2006 | Battlefield High School | Virginia | 1885 | http://www.ilite.us |
| 6 | None | Washington | USA | None | None | frc1915 | None | None | None | None | NASA Headquarters/Bechtel/DC Public Schools/Go... | MTHS Firebird Robotics | 20002 | 2006 | Mckinley Tech High School | District of Columbia | 1915 | http://www.firstinspires.org/ |
| 7 | None | Chantilly | USA | None | None | frc2186 | None | None | None | None | BAE Systems/CACI/ICF/The Monachello Family/The... | Dogs of Steel | 20151 | 2007 | Westfield High School | Virginia | 2186 | http://www.dogsofsteel.org |
| 8 | None | Columbia | USA | None | None | frc2537 | None | None | None | None | Maryland State Department of Education/Marylan... | Space RAIDers | 21044 | 2008 | Atholton High School | Maryland | 2537 | http://www.team2537.com |
| 9 | None | Columbia | USA | None | None | frc2849 | None | None | None | None | Maryland State Department of Education / Varin... | Ursa Major | 21046 | 2009 | Hammond High School | Maryland | 2849 | http://www.hammondursamajor.org/ursamajor2849/ |
| 10 | None | Washington | USA | None | None | frc2900 | None | None | None | None | DCPS/Google/United Therapeutics&School Without... | The Mighty Penguins | 20037 | 2009 | School Without Walls Shs | District of Columbia | 2900 | http://www.firstinspires.org/ |
| 11 | None | Washington | USA | None | None | frc2912 | None | None | None | None | Google / Bechtel / DCPS CTE & Phelps Ace High ... | Panther Robotics | 20002 | 2009 | Phelps Ace High School | District of Columbia | 2912 | http://www.firstinspires.org/ |
| 12 | None | Washington | USA | None | None | frc2914 | None | None | None | None | Bechtel/Amazon/NASA&Woodrow Wilson Senior High... | TIGER PRIDE | 20016 | 2009 | Woodrow Wilson Senior High Sch | District of Columbia | 2914 | https://www.wilsonrobotics.net |
| 13 | None | Salisbury | USA | None | None | frc3389 | None | None | None | None | Wicomico County Robotics Club / NASA- Wallops ... | TEC Tigers | 21804 | 2010 | Parkside High School - CTE & Parkside High School | Maryland | 3389 | https://wicomicocountyroboticsclub.weebly.com |
| 14 | None | La Plata | USA | None | None | frc3650 | None | None | None | None | Department of Defense, STEM / Navy Surface War... | RoboRaptors | 20646 | 2011 | St Charles High School & North Pt Hs-Sci Tech ... | Maryland | 3650 | http://www.firstinspires.org/ |
| 15 | None | Frederick | USA | None | None | frc3793 | None | None | None | None | Bechtel/Lockheed Martin/Leidos/FCPS MD&Middlet... | CyberTitans | 21703 | 2011 | Middletown High School & Tuscarora High School | Maryland | 3793 | http://cybertitans3793.com |
| 16 | None | Washington | USA | None | None | frc4456 | None | None | None | None | Edison Electrical/Leidos&St John's College Hig... | Mech Cadets | 20015 | 2013 | St John's College High School | District of Columbia | 4456 | https://frc4456.com/ |
| 17 | None | Laurel | USA | None | None | frc4464 | None | None | None | None | Chesapeake Lighthouse Foundation/MSBR/Abbott&C... | Team Illusion | 20707 | 2013 | Chesapeake Math & It Pc-N-Ms | Maryland | 4464 | http://www.teamillusion4464.com/ |
| 18 | None | Woodbridge | USA | None | None | frc4472 | None | None | None | None | Lockheed Martin/Micron Technology/Raytheon Tec... | SuperNOVA | 22192 | 2013 | Family/Community | Virginia | 4472 | http://4472supernova.org/ |
| 19 | None | Silver Spring | USA | None | None | frc449 | None | None | None | None | Intelligent Automation Inc./MBHS Magnet Founda... | The Blair Robot Project | 20901 | 2000 | Montgomery Blair High School | Maryland | 449 | https://robot.mbhs.edu/ |
| 20 | None | Huntingtown | USA | None | None | frc4514 | None | None | None | None | DoDSTEM/Booz-Allen-Hamilton/Calvert Help Assoc... | Calvert STEAM Works | 20639 | 2013 | Northern High School & Huntingtown High School | Maryland | 4514 | https://sites.google.com/view/steamworks4514 |
| 21 | None | Washington | USA | None | None | frc4821 | None | None | None | None | United Therapeutics/FIRST Chesapeake/ Capital ... | cyberUs | 20011 | 2013 | District of Columbia International School | District of Columbia | 4821 | http://www.cyberus4821.weebly.com |
| 22 | None | Riverdale | USA | None | None | frc4949 | None | None | None | None | City of College Park, MD / DoDSTEM / Leidos / ... | Robo Panthers | 20737 | 2014 | Parkdale High School | Maryland | 4949 | http:///www.phsrobopanthers.org |
| 23 | None | Silver Spring | USA | None | None | frc5115 | None | None | None | None | Montgomery County Public Schools/GEICO Informa... | Knight Riders | 20906 | 2014 | Wheaton Senior High School | Maryland | 5115 | https://wheatonrobotics.org/ |
| 24 | None | Clifton | USA | None | None | frc5243 | None | None | None | None | Leidos/IBM/George Mason University/TapHere! Te... | Aegis Robotics | 20124 | 2014 | Centreville High School | Virginia | 5243 | http://www.centrevillerobotics.net/ |
| 25 | None | Alexandria | USA | None | None | frc5587 | None | None | None | None | DoD STEM/Google/Boeing/Raytheon/Comcast/Intuit... | Titan Robotics | 22302 | 2015 | Alexandria City High School | Virginia | 5587 | https://frc5587.org |
| 26 | None | Gambrills | USA | None | None | frc5830 | None | None | None | None | Trek Networks/Maryland Space Business Roundtab... | LIFE Engineering | 21054 | 2016 | Home School | Maryland | 5830 | http://www.team5830.org |
| 27 | None | Frederick | USA | None | None | frc5841 | None | None | None | None | Maryland State Department of Education/Lockhee... | The Patriots | 21701 | 2016 | Gov Thomas Johnson High School | Maryland | 5841 | http://www.firstinspires.org/ |
| 28 | None | Fulton | USA | None | None | frc5945 | None | None | None | None | Greenebaum Enterprises/Eaton/W R Grace/Hackgro... | |CTRL| (Absolute Control) | 20759 | 2016 | Family/Community | Maryland | 5945 | http://frc.thehackground.org |
| 29 | None | Bowie | USA | None | None | frc6213 | None | None | None | None | Patient First/uBreakiFix/SnapMobile/Argosy/Lei... | Team Quantum | 20715 | 2016 | Bowie High School | Maryland | 6213 | http:///www.teamquantum#6213.org |
| 30 | None | Capitol Heights | USA | None | None | frc6239 | None | None | None | None | Maryland Space Business Roundtable/Omnyon/Cere... | The Irrational Engineers | 20715 | 2016 | Family/Community | Maryland | 6239 | http://www.theirrationalengineers.com/ |
| 31 | None | Baltimore | USA | None | None | frc6326 | None | None | None | None | Galen Robotics/The Abell Foundation&Northeast ... | ⚡ Baltimore Bolts ⚡ | 21230 | 2017 | Baltimore City College HS 480 & Northeast Seni... | Maryland | 6326 | http://www.baltimorebolts.com |
| 32 | None | Thurmont | USA | None | None | frc686 | None | None | None | None | Maryland State Department of Education/Bechtel... | Bovine Intervention | 21788 | 2001 | Catoctin High School & Linganore High School &... | Maryland | 686 | https://sites.google.com/view/firstteam686/abo... |
| 33 | None | Greenbelt | USA | None | None | frc6893 | None | None | None | None | MASER/Washington Academy of Sciences/Sigma Xi/... | Bladerunners | 20710 | 2018 | Family/Community | Maryland | 6893 | http:///www.maserdc.org |
| 34 | None | Washington | USA | None | None | frc7714 | None | None | None | None | Cardozo Education Campus | RedRoad | 20009 | 2019 | Cardozo Education Campus | District of Columbia | 7714 | None |
| 35 | None | Bel Air | USA | None | None | frc7770 | None | None | None | None | Family/Community | Infinite Voltage | 21014 | 2019 | Family/Community | Maryland | 7770 | None |
From the DataFrame above, we can see an abundance of information. Most of it is unnecessary for this analysis; the exceptions are the key, which identifies each team, and the team's rookie year -- there might be a relationship between team age and scores/winning. I decided to keep these two columns and drop everything else.
teams_at_event = teams_at_event[['key', 'rookie_year']]
teams_at_event
| key | rookie_year | |
|---|---|---|
| 0 | frc1111 | 2003 |
| 1 | frc1123 | 2003 |
| 2 | frc1389 | 2004 |
| 3 | frc1446 | 2004 |
| 4 | frc1727 | 2006 |
| 5 | frc1885 | 2006 |
| 6 | frc1915 | 2006 |
| 7 | frc2186 | 2007 |
| 8 | frc2537 | 2008 |
| 9 | frc2849 | 2009 |
| 10 | frc2900 | 2009 |
| 11 | frc2912 | 2009 |
| 12 | frc2914 | 2009 |
| 13 | frc3389 | 2010 |
| 14 | frc3650 | 2011 |
| 15 | frc3793 | 2011 |
| 16 | frc4456 | 2013 |
| 17 | frc4464 | 2013 |
| 18 | frc4472 | 2013 |
| 19 | frc449 | 2000 |
| 20 | frc4514 | 2013 |
| 21 | frc4821 | 2013 |
| 22 | frc4949 | 2014 |
| 23 | frc5115 | 2014 |
| 24 | frc5243 | 2014 |
| 25 | frc5587 | 2015 |
| 26 | frc5830 | 2016 |
| 27 | frc5841 | 2016 |
| 28 | frc5945 | 2016 |
| 29 | frc6213 | 2016 |
| 30 | frc6239 | 2016 |
| 31 | frc6326 | 2017 |
| 32 | frc686 | 2001 |
| 33 | frc6893 | 2018 |
| 34 | frc7714 | 2019 |
| 35 | frc7770 | 2019 |
Now that I have the teams at the event, I want to collect data about each match. This involves another API call.
final = []
match_request = requests.get("https://www.thebluealliance.com/api/v3/event/2019mdbet/matches", headers = headers)
data = match_request.json()
matches = pd.DataFrame.from_dict(data)
matches
| actual_time | alliances | comp_level | event_key | key | match_number | post_result_time | predicted_time | score_breakdown | set_number | time | videos | winning_alliance | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1552248703 | {'blue': {'dq_team_keys': [], 'score': 80, 'su... | f | 2019mdbet | 2019mdbet_f1m1 | 1 | 1552248893 | 1552248722 | {'blue': {'adjustPoints': 0, 'autoPoints': 12,... | 1 | 1552248360 | [{'key': 'jpggyoCj7fc', 'type': 'youtube'}] | blue |
| 1 | 1552249441 | {'blue': {'dq_team_keys': [], 'score': 76, 'su... | f | 2019mdbet | 2019mdbet_f1m2 | 2 | 1552249626 | 1552249562 | {'blue': {'adjustPoints': 0, 'autoPoints': 15,... | 1 | 1552248780 | [{'key': 'Ai35S_iX0wI', 'type': 'youtube'}] | red |
| 2 | 1552250204 | {'blue': {'dq_team_keys': [], 'score': 74, 'su... | f | 2019mdbet | 2019mdbet_f1m3 | 3 | 1552250460 | 1552250283 | {'blue': {'adjustPoints': 0, 'autoPoints': 12,... | 1 | 1552249200 | [{'key': 'Q1EAiCSyLqc', 'type': 'youtube'}] | blue |
| 3 | 1552240516 | {'blue': {'dq_team_keys': [], 'score': 24, 'su... | qf | 2019mdbet | 2019mdbet_qf1m1 | 1 | 1552240692 | 1552240564 | {'blue': {'adjustPoints': 0, 'autoPoints': 12,... | 1 | 1552240800 | [{'key': 'F189-5URbTU', 'type': 'youtube'}] | red |
| 4 | 1552242896 | {'blue': {'dq_team_keys': [], 'score': 41, 'su... | qf | 2019mdbet | 2019mdbet_qf1m2 | 2 | 1552243137 | 1552243024 | {'blue': {'adjustPoints': 0, 'autoPoints': 12,... | 1 | 1552242480 | [{'key': 's2-2HVQeIdE', 'type': 'youtube'}] | red |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 83 | 1552245064 | {'blue': {'dq_team_keys': [], 'score': 64, 'su... | sf | 2019mdbet | 2019mdbet_sf1m1 | 1 | 1552245241 | 1552245003 | {'blue': {'adjustPoints': 0, 'autoPoints': 15,... | 1 | 1552245840 | [{'key': 'g1G_BORXPVA', 'type': 'youtube'}] | blue |
| 84 | 1552246734 | {'blue': {'dq_team_keys': [], 'score': 44, 'su... | sf | 2019mdbet | 2019mdbet_sf1m2 | 2 | 1552246917 | 1552246804 | {'blue': {'adjustPoints': 0, 'autoPoints': 15,... | 1 | 1552246680 | [{'key': 'OeUp7Sv3ULA', 'type': 'youtube'}] | red |
| 85 | 1552247740 | {'blue': {'dq_team_keys': [], 'score': 59, 'su... | sf | 2019mdbet | 2019mdbet_sf1m3 | 3 | 1552247911 | 1552247706 | {'blue': {'adjustPoints': 0, 'autoPoints': 9, ... | 1 | 1552247520 | [{'key': 'GhCt7EQadKQ', 'type': 'youtube'}] | blue |
| 86 | 1552245866 | {'blue': {'dq_team_keys': [], 'score': 78, 'su... | sf | 2019mdbet | 2019mdbet_sf2m1 | 1 | 1552246045 | 1552245842 | {'blue': {'adjustPoints': 0, 'autoPoints': 15,... | 2 | 1552246260 | [{'key': 'oZN0YtEyoVk', 'type': 'youtube'}] | blue |
| 87 | 1552247203 | {'blue': {'dq_team_keys': [], 'score': 79, 'su... | sf | 2019mdbet | 2019mdbet_sf2m2 | 2 | 1552247378 | 1552247183 | {'blue': {'adjustPoints': 0, 'autoPoints': 12,... | 2 | 1552247100 | [{'key': 'AG4tsEDb6aU', 'type': 'youtube'}] | blue |
88 rows × 13 columns
From this printout and the API documentation, there are two important things to notice: 1) The information about match scores is nested in another structure under the "alliances" column, which will need to be unpacked further to access that data. 2) The match data above also includes playoff matches (quarterfinals, semifinals, and finals), which are outside the scope I've chosen: analyzing just the qualification match data.
Based on these two observations, I first decided to remove all playoff data.
# Removing any playoff match data
qualifications = matches[~matches['key'].str.contains("_f|_qf|_sf")]
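As an aside, TBA also exposes a `comp_level` field on each match ("qm" for qualifications; "qf", "sf", "f" for playoffs), so the same filter can be written without string matching on the key. A minimal sketch with toy data standing in for the real `matches` DataFrame:

```python
import pandas as pd

# Toy stand-in for the matches DataFrame returned by the API call above
matches = pd.DataFrame({
    'key': ['2019mdbet_qm1', '2019mdbet_qf1m1', '2019mdbet_sf1m1', '2019mdbet_f1m1'],
    'comp_level': ['qm', 'qf', 'sf', 'f'],
})
# Qualification matches are exactly the rows labeled "qm"
qualifications = matches[matches['comp_level'] == 'qm']
```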
After removing playoff data, I began to dissect the scores.
# Grabbing the name of the event (for later use)
event_name = qualifications.iat[0,3]
qualification_scores = qualifications[['alliances']]
qualification_scores
| alliances | |
|---|---|
| 11 | {'blue': {'dq_team_keys': [], 'score': 30, 'su... |
| 12 | {'blue': {'dq_team_keys': [], 'score': 23, 'su... |
| 13 | {'blue': {'dq_team_keys': [], 'score': 27, 'su... |
| 14 | {'blue': {'dq_team_keys': [], 'score': 18, 'su... |
| 15 | {'blue': {'dq_team_keys': [], 'score': 32, 'su... |
| ... | ... |
| 78 | {'blue': {'dq_team_keys': [], 'score': 64, 'su... |
| 79 | {'blue': {'dq_team_keys': [], 'score': 71, 'su... |
| 80 | {'blue': {'dq_team_keys': [], 'score': 44, 'su... |
| 81 | {'blue': {'dq_team_keys': [], 'score': 50, 'su... |
| 82 | {'blue': {'dq_team_keys': [], 'score': 11, 'su... |
72 rows × 1 columns
Even though I removed playoff data, I still want to keep information about what teams won at the event, to identify any possible correlations between qualification performance and winning. This information was not provided in this API call, so I needed to make another API call that would contain this data (namely, in the Awards API call).
winner_request = requests.get("https://www.thebluealliance.com/api/v3/event/2019mdbet/awards", headers = headers)
data = winner_request.json()
awards = pd.DataFrame.from_dict(data)
winners = pd.DataFrame((awards[awards['name'] == "District Event Winner"])['recipient_list'].tolist())
# Since there can be 3-4 winners per event, this block breaks the winners up into
# individual columns, then gathers the keys of all winning teams.
winner_list = []
for index, column in winners.items():
    for awardee, team in enumerate(column):
        winner_list.append(team['team_key'])
winner_list
['frc1885', 'frc449', 'frc2849']
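The same winner keys can also be pulled with a single comprehension over the raw `recipient_list` entry, skipping the intermediate DataFrame entirely. A sketch, with the award JSON shape stubbed out based on the structure returned above:

```python
# Toy stand-in for one award's recipient_list as returned by the Awards call
recipient_list = [
    {'team_key': 'frc1885', 'awardee': None},
    {'team_key': 'frc449', 'awardee': None},
    {'team_key': 'frc2849', 'awardee': None},
]
# One key per winning team, in the order the API lists them
winner_list = [recipient['team_key'] for recipient in recipient_list]
```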
After gathering winners, I then continued to dissect qualification match scores. The API call split the scores based on the two alliances, so I took a further step and created two DataFrames to handle the two alliances separately.
blue = pd.DataFrame((pd.DataFrame(qualification_scores['alliances'].tolist()))['blue'].tolist())
red = pd.DataFrame((pd.DataFrame(qualification_scores['alliances'].tolist()))['red'].tolist())
blue, red
( dq_team_keys score surrogate_team_keys team_keys
0 [] 30 [] [frc2186, frc2900, frc1727]
1 [] 23 [] [frc1446, frc1885, frc2537]
2 [] 27 [] [frc5830, frc6893, frc1123]
3 [] 18 [] [frc6326, frc2914, frc4456]
4 [] 32 [] [frc686, frc4514, frc5115]
.. ... ... ... ...
67 [] 64 [] [frc2900, frc3793, frc1885]
68 [] 71 [] [frc4472, frc5587, frc6326]
69 [] 44 [] [frc1123, frc1111, frc6213]
70 [] 50 [] [frc4949, frc5115, frc1389]
71 [] 11 [] [frc6213, frc7714, frc3389]
[72 rows x 4 columns],
dq_team_keys score surrogate_team_keys team_keys
0 [] 47 [] [frc5841, frc686, frc1885]
1 [] 62 [] [frc3650, frc686, frc2912]
2 [] 26 [] [frc7770, frc1915, frc1111]
3 [] 23 [] [frc4464, frc2900, frc5587]
4 [] 21 [] [frc5841, frc6213, frc449]
.. ... ... ... ...
67 [] 38 [] [frc5115, frc3650, frc2914]
68 [] 53 [] [frc1727, frc5243, frc5841]
69 [] 36 [] [frc4821, frc2186, frc1915]
70 [] 45 [] [frc6239, frc5243, frc4472]
71 [] 32 [] [frc449, frc2849, frc3793]
[72 rows x 4 columns])
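An alternative to the chained `pd.DataFrame(...tolist())` calls is `pd.json_normalize`, which flattens the nested dictionaries into dotted column names in one step. A sketch with a toy version of the nested structure:

```python
import pandas as pd

# Toy version of one row of the nested "alliances" column
alliances = [
    {'blue': {'score': 30, 'team_keys': ['frc2186', 'frc2900', 'frc1727']},
     'red':  {'score': 47, 'team_keys': ['frc5841', 'frc686', 'frc1885']}},
]
flat = pd.json_normalize(alliances)
# flat now has columns like 'blue.score', 'blue.team_keys', 'red.score', ...
```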
Now, putting it all together: I looped through every team and matched up their scores for each match they played, what alliance they were on for that match, and if they ultimately won the event.
final = []
for index1, row1 in teams_at_event.iterrows():
    team = row1['key']
    # Record a row for every match the team played on the Blue Alliance
    for index, row in blue.iterrows():
        if team in row['team_keys']:
            final.append([team, row1['rookie_year'], row['score'], event_name, "blue", team in winner_list])
    # ... and every match the team played on the Red Alliance
    for index, row in red.iterrows():
        if team in row['team_keys']:
            final.append([team, row1['rookie_year'], row['score'], event_name, "red", team in winner_list])
all_matches = pd.DataFrame(final, columns=['team_name', 'rookie_year', 'score', 'event_code', 'alliance_color', 'won_event'])
all_matches
| team_name | rookie_year | score | event_code | alliance_color | won_event | |
|---|---|---|---|---|---|---|
| 0 | frc1111 | 2003 | 29 | 2019mdbet | blue | False |
| 1 | frc1111 | 2003 | 36 | 2019mdbet | blue | False |
| 2 | frc1111 | 2003 | 54 | 2019mdbet | blue | False |
| 3 | frc1111 | 2003 | 29 | 2019mdbet | blue | False |
| 4 | frc1111 | 2003 | 42 | 2019mdbet | blue | False |
| ... | ... | ... | ... | ... | ... | ... |
| 427 | frc7770 | 2019 | 42 | 2019mdbet | red | False |
| 428 | frc7770 | 2019 | 41 | 2019mdbet | red | False |
| 429 | frc7770 | 2019 | 83 | 2019mdbet | red | False |
| 430 | frc7770 | 2019 | 45 | 2019mdbet | red | False |
| 431 | frc7770 | 2019 | 54 | 2019mdbet | red | False |
432 rows × 6 columns
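For what it's worth, the nested membership loops above can also be expressed with pandas' `explode`, which turns each alliance's list of teams into one row per (team, match) pair and lets `merge`/`isin` do the rest. A sketch with toy stand-ins for the tables built earlier (shown for the blue alliance only):

```python
import pandas as pd

# Toy stand-ins for the blue-alliance table, the team table, and the winner list
blue = pd.DataFrame({'score': [30, 23],
                     'team_keys': [['frc1111', 'frc2186'], ['frc449', 'frc1885']]})
teams_at_event = pd.DataFrame({'key': ['frc1111', 'frc2186', 'frc449', 'frc1885'],
                               'rookie_year': [2003, 2007, 2000, 2006]})
winner_list = ['frc1885']

# One row per (team, match) instead of a nested membership loop
rows = blue.explode('team_keys').rename(columns={'team_keys': 'team_name'})
rows['alliance_color'] = 'blue'
rows = rows.merge(teams_at_event.rename(columns={'key': 'team_name'}), on='team_name')
rows['won_event'] = rows['team_name'].isin(winner_list)
```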
After compiling all of the match data together, I added three more columns for the additional statistics that the TBA API provides, which required another API call: 1) Offensive Power Rating (OPR): a measure of how many points (on average) an individual team contributes to the overall score of each match (higher is better). 2) Defensive Power Rating (DPR): a measure of how defensive a robot is (lower is better). 3) Calculated Contribution to Winning Margin (CCWM): a measure of how impactful a team is toward helping its alliance win a match (higher is better).
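TBA computes these values server-side, but for intuition: OPR is conventionally estimated by least squares, where each row of a matrix marks which teams played on an alliance and the target is that alliance's score. A toy sketch of that formulation (the team indices and scores are made up for illustration):

```python
import numpy as np

# A[i, j] = 1 if team j played on alliance i; b[i] = that alliance's score
A = np.array([
    [1, 1, 0, 0],   # teams 0 and 1 scored 30 together
    [0, 0, 1, 1],   # teams 2 and 3 scored 50 together
    [1, 0, 1, 0],   # teams 0 and 2 scored 40 together
    [0, 1, 0, 1],   # teams 1 and 3 scored 40 together
], dtype=float)
b = np.array([30.0, 50.0, 40.0, 40.0])

# Least-squares estimate of each team's average point contribution (its OPR)
opr, *_ = np.linalg.lstsq(A, b, rcond=None)
```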
stats_request = requests.get("https://www.thebluealliance.com/api/v3/event/2019mdbet/oprs", headers = headers)
data = stats_request.json()
oprs = pd.DataFrame.from_dict(data)
oprs
| ccwms | dprs | oprs | |
|---|---|---|---|
| frc1111 | -3.961947 | 14.576206 | 10.614260 |
| frc1123 | 8.115785 | 9.208252 | 17.324036 |
| frc1389 | -4.272037 | 15.681795 | 11.409758 |
| frc1446 | -5.162219 | 13.027866 | 7.865647 |
| frc1727 | 17.734509 | 8.645342 | 26.379851 |
| frc1885 | 6.609230 | 15.558880 | 22.168110 |
| frc1915 | 0.739884 | 5.308569 | 6.048453 |
| frc2186 | -15.523956 | 20.251473 | 4.727517 |
| frc2537 | -12.371876 | 18.952392 | 6.580516 |
| frc2849 | 2.596591 | 8.860362 | 11.456953 |
| frc2900 | -3.647268 | 8.750373 | 5.103105 |
| frc2912 | 11.489318 | 8.845674 | 20.334992 |
| frc2914 | 2.306202 | 10.556711 | 12.862913 |
| frc3389 | 4.084222 | 7.150827 | 11.235049 |
| frc3650 | -10.128670 | 17.820775 | 7.692105 |
| frc3793 | 14.528858 | 10.152337 | 24.681195 |
| frc4456 | 6.401417 | 11.303293 | 17.704710 |
| frc4464 | -2.265041 | 11.673013 | 9.407972 |
| frc4472 | -1.487262 | 23.822118 | 22.334856 |
| frc449 | 1.735295 | 13.983416 | 15.718712 |
| frc4514 | 8.115019 | 9.552085 | 17.667104 |
| frc4821 | 9.727078 | 8.037392 | 17.764470 |
| frc4949 | -3.317042 | 7.749086 | 4.432044 |
| frc5115 | -2.545746 | 14.725956 | 12.180209 |
| frc5243 | -7.922654 | 17.923467 | 10.000813 |
| frc5587 | 7.753885 | 13.965552 | 21.719436 |
| frc5830 | -1.711540 | 14.820681 | 13.109141 |
| frc5841 | -9.606052 | 16.133827 | 6.527775 |
| frc5945 | -4.230201 | 12.911656 | 8.681455 |
| frc6213 | -0.194984 | 3.761038 | 3.566054 |
| frc6239 | 1.507788 | 10.640921 | 12.148709 |
| frc6326 | -9.716943 | 16.102791 | 6.385848 |
| frc686 | 1.069198 | 17.151940 | 18.221138 |
| frc6893 | -4.061566 | 11.713810 | 7.652244 |
| frc7714 | -8.525617 | 12.449991 | 3.924374 |
| frc7770 | 6.138342 | 12.646798 | 18.785140 |
# The team key is the index of the OPR table; copy it into a column to merge on
oprs['team_name'] = oprs.index
all_matches = pd.merge(all_matches, oprs, on='team_name')
all_matches
| team_name | rookie_year | score | event_code | alliance_color | won_event | ccwms | dprs | oprs | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | frc1111 | 2003 | 29 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.61426 |
| 1 | frc1111 | 2003 | 36 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.61426 |
| 2 | frc1111 | 2003 | 54 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.61426 |
| 3 | frc1111 | 2003 | 29 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.61426 |
| 4 | frc1111 | 2003 | 42 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.61426 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 427 | frc7770 | 2019 | 42 | 2019mdbet | red | False | 6.138342 | 12.646798 | 18.78514 |
| 428 | frc7770 | 2019 | 41 | 2019mdbet | red | False | 6.138342 | 12.646798 | 18.78514 |
| 429 | frc7770 | 2019 | 83 | 2019mdbet | red | False | 6.138342 | 12.646798 | 18.78514 |
| 430 | frc7770 | 2019 | 45 | 2019mdbet | red | False | 6.138342 | 12.646798 | 18.78514 |
| 431 | frc7770 | 2019 | 54 | 2019mdbet | red | False | 6.138342 | 12.646798 | 18.78514 |
432 rows × 9 columns
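The `merge` above is a many-to-one join: each team has one row in `oprs` but many match rows, so its statistics get broadcast onto every match it played. A toy sketch of that behavior (column names follow the DataFrames above):

```python
import pandas as pd

match_rows = pd.DataFrame({'team_name': ['frc1111', 'frc1111', 'frc1885'],
                           'score': [29, 36, 47]})
team_stats = pd.DataFrame({'team_name': ['frc1111', 'frc1885'],
                           'oprs': [10.61, 22.17]})
# Each team's single stats row is repeated across all of its match rows
merged = pd.merge(match_rows, team_stats, on='team_name')
```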
Thus, this completes the data curation for the 2019mdbet event. We now have each team that competed in this event, their rookie year, their score for each qualification match played, whether they won the 2019mdbet event, and their OPR, DPR, and CCWM values. I will now repeat this process for all of the other events.
# Drop 2019mdbet from the list, since its data has already been curated above
district_events = district_events.drop(district_events.index[1])
district_events
| 0 | |
|---|---|
| 0 | 2019chcmp |
| 2 | 2019mdowi |
| 3 | 2019mdoxo |
| 4 | 2019vabla |
| 5 | 2019vagle |
| 6 | 2019vahay |
| 7 | 2019vapor |
for index, event in district_events.iterrows():
    event_name = event[0]
    # Team curation
    url = "https://www.thebluealliance.com/api/v3/event/" + event_name + "/teams"
    teams_request = requests.get(url, headers = headers)
    data = teams_request.json()
    teams_at_event = pd.DataFrame.from_dict(data)
    teams_at_event = teams_at_event[['key', 'rookie_year']]
    # Grabbing match data
    final = []
    url = "https://www.thebluealliance.com/api/v3/event/" + event_name + "/matches"
    match_request = requests.get(url, headers = headers)
    data = match_request.json()
    matches = pd.DataFrame.from_dict(data)
    qualifications = matches[~matches['key'].str.contains("_f|_qf|_sf")]
    qualification_scores = qualifications[['alliances']]
    # Grabbing winners at each event -- the championship winners are listed under a
    # different award name in the API, so that distinction is made here.
    if event_name == "2019chcmp":
        award_name = "District Championship Winner"
    else:
        award_name = "District Event Winner"
    url = "https://www.thebluealliance.com/api/v3/event/" + event_name + "/awards"
    winner_request = requests.get(url, headers = headers)
    data = winner_request.json()
    awards = pd.DataFrame.from_dict(data)
    winners = pd.DataFrame((awards[awards['name'] == award_name])['recipient_list'].tolist())
    winner_list = []
    for winner_index, column in winners.items():
        for awardee, team in enumerate(column):
            winner_list.append(team['team_key'])
    # Separating match data by alliance for score pulling
    blue = pd.DataFrame((pd.DataFrame(qualification_scores['alliances'].tolist()))['blue'].tolist())
    red = pd.DataFrame((pd.DataFrame(qualification_scores['alliances'].tolist()))['red'].tolist())
    for index1, row1 in teams_at_event.iterrows():
        team = row1['key']
        for index2, row in blue.iterrows():
            if team in row['team_keys']:
                final.append([team, row1['rookie_year'], row['score'], event_name, "blue", team in winner_list])
        for index2, row in red.iterrows():
            if team in row['team_keys']:
                final.append([team, row1['rookie_year'], row['score'], event_name, "red", team in winner_list])
    compiled = pd.DataFrame(final, columns=['team_name', 'rookie_year', 'score', 'event_code', 'alliance_color', 'won_event'])
    # Grabbing OPRs and other statistics, and combining with the existing data for 2019mdbet
    url = "https://www.thebluealliance.com/api/v3/event/" + event_name + "/oprs"
    stats_request = requests.get(url, headers = headers)
    data = stats_request.json()
    oprs = pd.DataFrame.from_dict(data)
    oprs['team_name'] = oprs.index
    compiled = pd.merge(compiled, oprs, on='team_name')
    all_matches = pd.concat([all_matches, compiled])
all_matches = all_matches.reset_index()
all_matches
| index | team_name | rookie_year | score | event_code | alliance_color | won_event | ccwms | dprs | oprs | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | frc1111 | 2003 | 29 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.614260 |
| 1 | 1 | frc1111 | 2003 | 36 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.614260 |
| 2 | 2 | frc1111 | 2003 | 54 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.614260 |
| 3 | 3 | frc1111 | 2003 | 29 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.614260 |
| 4 | 4 | frc1111 | 2003 | 42 | 2019mdbet | blue | False | -3.961947 | 14.576206 | 10.614260 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3799 | 439 | frc977 | 2002 | 53 | 2019vapor | red | False | 3.722826 | 21.191992 | 24.914817 |
| 3800 | 440 | frc977 | 2002 | 60 | 2019vapor | red | False | 3.722826 | 21.191992 | 24.914817 |
| 3801 | 441 | frc977 | 2002 | 59 | 2019vapor | red | False | 3.722826 | 21.191992 | 24.914817 |
| 3802 | 442 | frc977 | 2002 | 59 | 2019vapor | red | False | 3.722826 | 21.191992 | 24.914817 |
| 3803 | 443 | frc977 | 2002 | 68 | 2019vapor | red | False | 3.722826 | 21.191992 | 24.914817 |
3804 rows × 10 columns
With the data we've collected so far, let's take a look at what scores at each event looked like:
plt.rcParams["figure.figsize"] = (10,10)
for event in all_matches['event_code'].unique():
    temp = all_matches.loc[all_matches['event_code'] == event]
    for team in temp['team_name'].unique():
        temp2 = temp[temp['team_name'] == team]
        match_counter = 1
        matches = []
        score = []
        for index, match in temp2.iterrows():
            matches.append(match_counter)
            score.append(match['score'])
            match_counter = match_counter + 1
        plt.plot(matches, score, label = team)
    plt.title("Qualification Scores for the teams at " + event)
    plt.xlabel("Match Played")
    plt.ylabel("Score from Match")
    plt.show()
From the jumble of lines in each plot, there is very little we can tell. For most events, the average score per qualification match ranged from 40-60 points, and many teams saw extreme swings in score from match to match. At the district championship (2019chcmp), 2 teams achieved season-high scores of 110+, which tracks with the caliber of play expected at district championships. To get a more meaningful breakdown of this data, we should filter on one feature. Logically, I would choose the winning teams per event, as they form a much smaller subset to analyze.
From the curated data, one major aspect I'm interested in is how the age of a team influences its chance of winning an event. Team age can be explored through win rate, as well as through how the additional statistics vary among the winning teams. But first:
Instinct might say that the 3-4 teams that win at each event must have consistently posted high scores throughout qualifications, but this is not necessarily true. During alliance selection, the Top 8 teams choose their first alliance member in order from Rank 1 to Rank 8, and then choose their second alliance member in reverse order, from Rank 8 back to Rank 1. There are also underlying factors that may not be captured by the collected data -- whether some teams have strong friendships with each other, or whether a team was simply unlucky with its match schedule and paired with underperforming robots, driving its ranking and statistics down when in reality it might be a strong sleeper pick.
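The serpentine pick order described above can be sketched in a few lines of Python (the seed numbers are just placeholder ranks, not real teams):

```python
# Serpentine alliance selection: seeds 1-8 pick their first partner in
# rank order, then pick their second partner in reverse rank order.
seeds = list(range(1, 9))  # alliance captains, ranked 1 through 8

first_round = seeds          # picks 1..8: Rank 1 chooses first
second_round = seeds[::-1]   # picks 9..16: Rank 8 chooses first

pick_order = first_round + second_round
print(pick_order)  # [1, 2, 3, 4, 5, 6, 7, 8, 8, 7, 6, 5, 4, 3, 2, 1]
```

This ordering is why a lower-seeded captain can still assemble a strong alliance: Rank 8 gets back-to-back picks at the turn of the serpentine.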
Regardless, I think it's important to look at the score data for the different winners and analyze what's there.
for event in all_matches['event_code'].unique():
    temp = all_matches.loc[all_matches['event_code'] == event]
    # Pulling out teams that won at the event
    temp2 = temp[temp['won_event'] == True]
    for team in temp2['team_name'].unique():
        temp3 = temp[temp['team_name'] == team]
        match_counter = 1
        matches = []
        score = []
        # Plotting each match the team played and the score for that match
        for index, match in temp3.iterrows():
            matches.append(match_counter)
            score.append(match['score'])
            match_counter = match_counter + 1
        plt.plot(matches, score, label = team)
    plt.legend(loc="upper left")
    plt.title("Qualification Scores for the Winning Teams at " + event)
    plt.xlabel("Match Played")
    plt.ylabel("Score from Match")
    plt.show()
From the above graphs, I think it's safe to say that the scores alone do not tell us much, similar to the graphs from the previous step. There is a great amount of variation in scores across all matches at all events. One interesting thing to note is that by the time teams reach the district championship, events have already been happening for 5-6 weeks. Most teams have already played at 2 events, some even at 3 if they pay to play an extra week. Not everyone qualifies for the district championship, so you would logically expect the caliber of play there to be higher. That makes the 2019chcmp data surprising: frc4541 had two drastic drops in their qualification scores, yet they were still able to move on to playoffs and win at the district championship level.
When populating the teams that won at each event, I made sure to count each team only once, even if it won at multiple events (which is possible). Therefore, while we'd expect 24-32 possible winners (8 events with 3 winners in the best case, versus 8 events with 4 winners in the worst case), the population can contain fewer than 24 unique teams.
winning_years = []
winning_teams_unique = []
winning_teams = []
for event in all_matches['event_code'].unique():
    temp = all_matches.loc[all_matches['event_code'] == event]
    temp2 = temp.drop_duplicates(subset = 'team_name')
    for index, row in temp2.iterrows():
        if row['won_event'] and row['team_name'] not in winning_teams:
            winning_years.append(row['rookie_year'])
            winning_teams_unique.append(row['team_name'])
        if row['won_event']:
            winning_teams.append(row['team_name'])
# Curating the amount of teams that won per Rookie Year
winning_years = pd.value_counts(winning_years)
winning_years = winning_years.to_frame()
winning_teams = pd.value_counts(winning_teams)
for index, row in winning_years.iterrows():
    winning_years.at[index, 'year'] = index
winning_years['year'] = winning_years['year'].astype(int)
winning_years = winning_years.rename(columns = {0 :'value'})
winning_years = winning_years.sort_values(by=['year'])
winning_years
| | value | year |
|---|---|---|
| 2000 | 3 | 2000 |
| 2001 | 4 | 2001 |
| 2002 | 1 | 2002 |
| 2004 | 2 | 2004 |
| 2005 | 2 | 2005 |
| 2006 | 2 | 2006 |
| 2008 | 1 | 2008 |
| 2009 | 1 | 2009 |
| 2010 | 1 | 2010 |
| 2011 | 1 | 2011 |
| 2013 | 1 | 2013 |
| 2017 | 1 | 2017 |
| 2018 | 1 | 2018 |
winning_teams
frc612     2
frc619     2
frc6882    2
frc346     2
frc6543    1
frc1599    1
frc2363    1
frc1731    1
frc539     1
frc1262    1
frc3274    1
frc1885    1
frc2849    1
frc1418    1
frc836     1
frc614     1
frc3748    1
frc4541    1
frc401     1
frc449     1
frc1610    1
dtype: int64
From the above, we can see that there were only 21 unique winning teams, with 4 teams winning at 2 district events each. Therefore, it's important that we remove these duplicates in our analysis.
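The same dedup can also be expressed directly with pandas; here is a minimal sketch on a toy frame (the rows, team keys, and event codes below are made up purely for illustration):

```python
import pandas as pd

# Toy stand-in for all_matches: one row per (team, event) is enough here.
toy = pd.DataFrame({
    'team_name':  ['frc612', 'frc612', 'frc619', 'frc100'],
    'event_code': ['2019vagle', '2019vahay', '2019vagle', '2019vagle'],
    'won_event':  [True, True, True, False],
})

winners = toy.loc[toy['won_event']]
print(winners['team_name'].nunique())             # unique winning teams: 2
print(winners['team_name'].value_counts().max())  # most events won by one team: 2
```

`nunique()` counts each team once regardless of how many events it won, which is exactly the dedup behavior the loop above implements by hand.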
winning_years.plot.bar(x='year', y='value', rot=0, title = "Number of Winning Teams per Rookie Year")
<AxesSubplot:title={'center':'Number of Winning Teams per Rookie Year'}, xlabel='year'>
From this, we can see that 7 of the 21 winners, or 1/3, were founded in 2000 or 2001. Furthermore, two-thirds of the winning teams were founded in 2006 or earlier, at least 13 years before the 2019 season. Looking at the 5 years prior to the 2019 season, only 2 teams founded in that time frame won at an event in 2019. This could be an early indicator that the older a team is, the more likely it is to win at an event.
opr = []
dpr = []
ccwm = []
years = []
winning_teams = []
for event in all_matches['event_code'].unique():
    temp = all_matches.loc[all_matches['event_code'] == event]
    temp2 = temp.drop_duplicates(subset = 'team_name')
    for index, row in temp2.iterrows():
        if row['won_event'] and row['team_name'] not in winning_teams:
            years.append(row['rookie_year'])
            opr.append(row['oprs'])
            dpr.append(row['dprs'])
            ccwm.append(row['ccwms'])
            # Track the team so repeat winners are only counted once
            winning_teams.append(row['team_name'])
zipped = list(zip(years, opr, dpr, ccwm))
statistics = pd.DataFrame(zipped, columns = ['Year', 'OPR', 'DPR', 'CCWM'])
seaborn.scatterplot(x="Year",
y="OPR",
data=statistics).set(title = "OPR vs Rookie Year of Winning Teams in 2019")
[Text(0.5, 1.0, 'OPR vs Rookie Year of Winning Teams in 2019')]
Looking at OPR vs Rookie Year, we can see a general downward trend in OPR for newer teams, indicating that older teams are able to provide more points per match during qualifications, and logically, into playoff matches as well. We can see how drastic this trend is if we fit a line to our scatterplot:
slope, intercept, r_value, pv, se = stats.linregress(statistics['Year'],statistics['OPR'])
print(slope)
-0.9723503931576458
print(intercept)
1972.2223485557051
r_value
-0.7043580944090574
From the above regression, OPR vs Rookie Year yields a regression equation of y = -0.972x + 1972.222, where x is the rookie year of the team. The regression also has an r-value of -0.704, indicating a moderately strong negative correlation between Rookie Year and OPR.
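To make the regression equation concrete, we can plug two rookie years into the fitted line and compare the predicted OPRs (the `predicted_opr` helper is just for illustration):

```python
# Predicted OPR from the fitted line y = slope * year + intercept,
# using the slope and intercept printed by linregress above.
slope, intercept = -0.9723503931576458, 1972.2223485557051

def predicted_opr(rookie_year):
    return slope * rookie_year + intercept

print(round(predicted_opr(2000), 1))  # a 2000-era team: ~27.5 OPR
print(round(predicted_opr(2018), 1))  # a 2018 rookie team: ~10.0 OPR
```

In other words, the fit predicts a roughly 17-point OPR gap between the oldest and newest winning teams, one point lost per year of youth.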
seaborn.scatterplot(x="Year",
y="DPR",
data=statistics).set(title = "DPR vs Rookie Year of Winning Teams in 2019")
[Text(0.5, 1.0, 'DPR vs Rookie Year of Winning Teams in 2019')]
Compared to the OPR vs Year scatter plot, this plot has a much more random spread, and does not suggest any significant relationship between DPR and the Rookie Year of winning teams.
seaborn.scatterplot(x="Year",
y="CCWM",
data=statistics).set(title = "CCWM vs Rookie Year of Winning Teams in 2019")
[Text(0.5, 1.0, 'CCWM vs Rookie Year of Winning Teams in 2019')]
Similar to the OPR vs Rookie Year plot from above, we can see a general downward trend in CCWM for newer teams, indicating that older teams are able to be more impactful to their alliance during qualifications, and logically, into playoff matches as well. We can see how drastic this trend is if we fit a line to our scatterplot:
slope, intercept, r_value, pv, se = stats.linregress(statistics['Year'],statistics['CCWM'])
print(slope)
-0.8721593643810367
print(intercept)
1755.6727851186945
r_value
-0.6213857140427312
Thus, from the above, we get the regression equation y = -0.872x + 1755.673, where x is the Rookie Year of the team. This line has an r-value of -0.621, a weaker correlation that isn't strong enough to suggest a significant relationship between CCWM and Rookie Year.
Based on the above result showing significance for the OPR vs Rookie Year, I decided to create a model to determine how accurate the OPR is in predicting the winner of an event. This task leads us to the next section:
Before starting, it's important to state what my different hypotheses are for this model:
Null Hypothesis: Rookie Year, Match Score, and OPR are not a good measure of whether a team will win at an event.
Alternative Hypothesis: Rookie Year, Match Score, and OPR are a good measure of whether a team will win at an event.
While we saw earlier that there was a significant relationship between OPR and the Rookie Year of winning teams, here we want to see how well winners can be predicted from those two features, plus the per-match score.
ind_data = all_matches[['rookie_year', 'score', 'oprs']]
ind_data
| | rookie_year | score | oprs |
|---|---|---|---|
| 0 | 2003 | 29 | 10.614260 |
| 1 | 2003 | 36 | 10.614260 |
| 2 | 2003 | 54 | 10.614260 |
| 3 | 2003 | 29 | 10.614260 |
| 4 | 2003 | 42 | 10.614260 |
| ... | ... | ... | ... |
| 3799 | 2002 | 53 | 24.914817 |
| 3800 | 2002 | 60 | 24.914817 |
| 3801 | 2002 | 59 | 24.914817 |
| 3802 | 2002 | 59 | 24.914817 |
| 3803 | 2002 | 68 | 24.914817 |
3804 rows × 3 columns
dep_data = all_matches['won_event']
dep_data
0 False
1 False
2 False
3 False
4 False
...
3799 False
3800 False
3801 False
3802 False
3803 False
Name: won_event, Length: 3804, dtype: bool
Using holdout validation, I split the collected data into training and test sets. Because this is a classification task, I decided to test two different model types: Decision Trees and Linear Discriminant Analysis. After running both models, I will compare their accuracy scores to determine how well each predicts winners from the three provided features.
ind_train, ind_test, dep_train, dep_test = ms.train_test_split(ind_data, dep_data, random_state=13)
decision_tree = DecisionTreeClassifier()
decision_tree = decision_tree.fit(ind_train, dep_train)
dt_predicted = decision_tree.predict(ind_test)
# Accuracy Score for Decision Tree Model
met.accuracy_score(dep_test, dt_predicted)
0.9989484752891693
lda = LinearDiscriminantAnalysis()
lda = lda.fit(ind_train, dep_train)
lda_predicted = lda.predict(ind_test)
# Accuracy Score for the Linear Discriminant Analysis
met.accuracy_score(dep_test, lda_predicted)
0.9263932702418507
From both of the accuracy scores above, both models were able to predict winners of events based on qualification scores, rookie year, and OPR ratings with more than 92% accuracy. Thus, in terms of my hypothesis, I would conclude that there is significant evidence from the accuracy scores to state that Rookie Year, Match Score, and OPR are a good measure of whether a team will win at an event.
There is one important caveat/limitation to note here: the OPR ranking from the TBA API only lists one averaged value per team, not a per-match value. Further data collection and analysis would be needed (if OPR even exists at a per-match level, or by manually calculating it) to determine how a changing OPR value after each match would impact both models and their predictive capabilities.
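For context on that limitation: OPR is conventionally computed as a least-squares estimate of each team's average contribution to its alliance's score, solving A·x = b where A encodes which teams played in each alliance and b holds the alliance scores. Here is a minimal sketch with three made-up teams whose scores happen to be perfectly additive, so the solve recovers their contributions exactly:

```python
import numpy as np

# Three hypothetical teams with true contributions 10, 20, 30 points.
# Each row of A marks the teams on one alliance; b is that alliance's score.
A = np.array([[1, 1, 0],   # teams 0 and 1 scored 30 together
              [0, 1, 1],   # teams 1 and 2 scored 50 together
              [1, 0, 1]])  # teams 0 and 2 scored 40 together
b = np.array([30, 50, 40])

# Least-squares solve recovers each team's contribution (its OPR).
opr, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(opr, 6))  # [10. 20. 30.]
```

Real match scores are noisy and not perfectly additive, which is why OPR is an estimate rather than a true measure of contribution; recomputing it after each match (a "rolling" OPR) would be one way to get the per-match signal discussed above.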
Let's recap what we've seen so far: 1) We started by gathering the relevant match data for the 2019 season from the TBA API. 2) We then graphed the score data for each event -- which didn't provide anything of significance on its own, so we broke the data down further in the exploratory data analysis. 3) Through that analysis, we examined possible trends among winning teams, and reached a conclusion about the relationship between the Rookie Year of a winning team and their OPR. 4) Using this initial conclusion, we trained two models on those two features, plus the score from each qualification match, to predict whether a team would win an event.
We now have a model that works on data from the FIRST Chesapeake District. Further expansion could include other districts (i.e. FIRST in Texas, FIRST Mid-Atlantic, FIRST in Michigan, etc.) to see how accurately the model predicts on a larger set of data. The basis of this model could also be applied to different seasons, to see how well it predicts winners across seasons. The model could also be adapted for a regional system, though its predictive power is stronger within a district system. Since teams play multiple events in a district, the model could additionally be extended to examine how the change in OPR between events impacts a team's chance of winning at their second event.
In general, there's one more thing to note about the FRC data used here, which also applies to future iterations/extensions of this model. Within the FRC community, many teams are commonly known to perform well and consistently win district events/regionals, some even at the championship level, year after year. Some examples include FRC254, FRC1678, and FRC1114. An FRC season can be a costly affair -- team budgets range from a few thousand dollars to well into the six figures. Money does play a factor in how well teams perform, from the equipment and tools they can purchase to other expenses. Similarly, resources such as mentor support, build space, and the student makeup of each team play a strong role in how well teams perform. None of this is entirely captured by the data presented here.